Beyond Reward: Offline Preference-guided Policy Optimization
This study focuses on offline preference-based reinforcement
learning (PbRL), a variant of conventional reinforcement learning that
dispenses with the need for online interaction or specification of reward
functions. Instead, the agent is provided with fixed offline trajectories and
human preferences between pairs of trajectories to extract the dynamics and
task information, respectively. Since the dynamics and task information are
orthogonal, a naive approach would involve using preference-based reward
learning followed by an off-the-shelf offline RL algorithm. However, this
requires the separate learning of a scalar reward function, which is assumed to
be an information bottleneck of the learning process. To address this issue, we
propose the offline preference-guided policy optimization (OPPO) paradigm,
which models offline trajectories and preferences in a one-step process,
eliminating the need for separately learning a reward function. OPPO achieves
this by introducing an offline hindsight information matching objective for
optimizing a contextual policy and a preference modeling objective for finding
the optimal context. OPPO further integrates a well-performing decision policy
by optimizing the two objectives iteratively. Our empirical results demonstrate
that OPPO effectively models offline preferences and outperforms prior
competing baselines, including offline RL algorithms performed over either true
or pseudo reward function specifications. Our code is available on the project
website: https://sites.google.com/view/oppo-icml-2023
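
To make the "naive approach" mentioned in the abstract concrete, the following is a minimal sketch of the preference-based reward learning objective that such a two-step pipeline would typically minimize before handing the learned reward to an offline RL algorithm. It is not OPPO itself, which the abstract says avoids a separate reward model. The Bradley-Terry preference model and the names `log_sigmoid` and `preference_loss` are assumptions made for illustration; the abstract does not name a specific preference model.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(pairs):
    """Mean negative log-likelihood under a Bradley-Terry model (assumed here).

    pairs: list of (return_a, return_b, label) tuples, where return_a and
    return_b are the summed predicted rewards of two trajectories and
    label is 1.0 if trajectory a was preferred, 0.0 if b was preferred.
    """
    total = 0.0
    for return_a, return_b, label in pairs:
        diff = return_a - return_b
        # P(a preferred over b) = sigmoid(return_a - return_b)
        total -= label * log_sigmoid(diff) + (1.0 - label) * log_sigmoid(-diff)
    return total / len(pairs)


# Hypothetical predicted returns: the loss is small when the predicted
# ordering agrees with the human label and large when it disagrees.
agree = preference_loss([(5.0, 0.0, 1.0)])
disagree = preference_loss([(0.0, 5.0, 1.0)])
print(agree < disagree)
```

In a full pipeline the returns would come from a learned reward network and the loss would be minimized by gradient descent; both are omitted here to keep the sketch self-contained.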
RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph
Developing robotic intelligent systems that can adapt quickly to unseen wild
situations is one of the critical challenges in pursuing autonomous robotics.
Although impressive progress has been made in walking stability and skill
learning for legged robots, their ability to adapt quickly is still inferior
to that of animals in nature. Animals are born with many of the skills needed
to survive and can quickly acquire new ones by composing fundamental skills
with limited experience. Inspired by this, we propose a novel framework, named
Robot Skill Graph (RSG), for organizing a large number of fundamental robot
skills and dexterously reusing them for fast adaptation.
Structured like a Knowledge Graph (KG), RSG is composed of a large set of
dynamic behavioral skills rather than the static knowledge of a KG, and it
enables the discovery of implicit relations between the learning context and
the skills a robot has acquired, serving as a starting point for understanding
subtle patterns in robots' skill learning. Extensive experimental results
demonstrate that RSG can provide reasonable skill inference for new tasks and
environments and enable quadruped robots to adapt to new scenarios and learn
new skills rapidly.
Impact of extracorporeal membrane oxygenation in immunocompetent children with severe adenovirus pneumonia
Background: Severe adenovirus (Adv.) pneumonia can cause significant mortality in young children. There has been no worldwide consensus on the impact of extracorporeal membrane oxygenation (ECMO) in immunocompetent children with severe Adv. pneumonia. This study aimed to assess the impact of ECMO in immunocompetent children with severe Adv. pneumonia.
Methods: This study evaluated the medical records of 168 hospitalized children with severe Adv. pneumonia at the Guangzhou Women and Children’s Medical Center between 2019 and 2020. Nineteen patients in the ECMO group and 149 patients in the non-ECMO group were enrolled.
Results: Between the two groups, there were no differences in host factors such as sex and age (all P > 0.05). Significant differences were observed in shortness of breath/increased work of breathing; cyanosis; seizures; tachycardia; the partial pressure of oxygen in arterial blood (PO2); the ratio of PaO2 to the fraction of inspired oxygen (FiO2; P/F); white blood cell, lymphocyte, monocyte, lactate dehydrogenase (LDH), serum albumin, and procalcitonin levels; and pulmonary consolidation (all P < 0.05). There were also significant differences in the parameters of mechanical ventilation (MV) therapy, in complications such as respiratory failure, acute respiratory distress syndrome, and septic shock, and in length of hospitalization and death (all P < 0.05). Maximum axillary temperatures, respiratory rates, heart rates, and LDH levels were significantly lower after ECMO than before ECMO (all P < 0.05). Additionally, SpO2, PO2, and P/F were significantly higher after ECMO than before ECMO (all P < 0.05). Among the MV parameters, FiO2, peak inspiratory pressure (PIP), and positive end-expiratory pressure (PEEP) were significantly lower after ECMO than before ECMO (all P < 0.05).
Conclusions: In our study, the clinical conditions of the patients in the ECMO group were much more severe than those in the non-ECMO group. Our study showed that ECMO might be beneficial for patients with severe Adv. pneumonia.
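
The P/F ratio reported in the results is PaO2 divided by FiO2, with FiO2 expressed as a fraction. The short sketch below shows that arithmetic only; the function name `pf_ratio`, the input validation, and the example values are illustrative assumptions and are not taken from the study's data.

```python
def pf_ratio(pao2_mmhg, fio2_fraction):
    """Return PaO2 / FiO2, with FiO2 given as a fraction in (0, 1]."""
    if not 0.0 < fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction in (0, 1], e.g. 0.5 for 50%")
    return pao2_mmhg / fio2_fraction


# Hypothetical values: PaO2 of 80 mmHg while breathing 50% oxygen.
print(pf_ratio(80.0, 0.5))  # prints 160.0
```

Under this definition, a rise in P/F after ECMO, as the abstract reports, reflects a higher PaO2 relative to the oxygen fraction being delivered.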